Abstract. When an i.i.d. sequence of letters is cut into words according to i.i.d. renewal
times, an i.i.d. sequence of words is obtained. In the annealed LDP (large deviation principle) 
for the empirical process of words, the rate function is the specific relative entropy of the 
observed law of words w.r.t. the reference law of words. In Birkner, Greven and den Hollander
[3] the quenched LDP (= conditional on a typical letter sequence) was derived for the case
where the renewal times have an algebraic tail. The rate function turned out to be a sum of 
two terms, one being the annealed rate function, the other being proportional to the specific 
relative entropy of the observed law of letters w.r.t. the reference law of letters, obtained 
by concatenating the words and randomising the location of the origin. The proportionality 
constant equals the tail exponent of the renewal process. 

The purpose of the present paper is to extend both LDP's to letter sequences that are 
not i.i.d. It is shown that both LDP's carry over when the letter sequence satisfies a mixing 
condition called summable variation. The rate functions are again given by specific relative 
entropies w.r.t. the reference law of words, respectively, letters. But since neither of these 
reference laws is i.i.d., several approximation arguments are needed to obtain the extension. 
1. Introduction and main results

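1.1. Notation. Let $E$ be a finite set of letters and $\widetilde{E} = \cup_{\ell \in \mathbb{N}} E^\ell$ the set of finite words drawn
from $E$. Write $E^{\mathbb{Z}}$ and $\widetilde{E}^{\mathbb{Z}}$ for the sets of two-sided sequences of letters and words, and let
$\theta$ and $\widetilde{\theta}$ denote the left-shifts acting on these sets, respectively. The sets of probability laws
on $E^{\mathbb{Z}}$ and $\widetilde{E}^{\mathbb{Z}}$ that are shift-invariant, respectively shift-invariant and ergodic w.r.t. $\theta$ and $\widetilde{\theta}$,
are denoted by $\mathcal{P}^{\mathrm{inv}}(E^{\mathbb{Z}})$ and $\mathcal{P}^{\mathrm{inv}}(\widetilde{E}^{\mathbb{Z}})$, respectively $\mathcal{P}^{\mathrm{inv,erg}}(E^{\mathbb{Z}})$ and $\mathcal{P}^{\mathrm{inv,erg}}(\widetilde{E}^{\mathbb{Z}})$, and are
endowed with the topology of weak convergence.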

Let $X = (X_k)_{k \in \mathbb{Z}}$ be a two-sided random sequence of letters sampled according to a shift-invariant
probability distribution $\nu$ on $E^{\mathbb{Z}}$. Let $\tau = (\tau_i)_{i \in \mathbb{Z}}$ be a two-sided i.i.d. sequence of
renewal times drawn from a common probability law $\rho$ on $\mathbb{N}$, independent of $X$. The latter
form a renewal process $T = (T_i)_{i \in \mathbb{Z}}$ given by

\[ T_0 = 0, \qquad T_i = T_{i-1} + \tau_i, \quad i \in \mathbb{Z}. \tag{1.1} \]

Let $Y = (Y_i)_{i \in \mathbb{Z}}$ be the two-sided random sequence of words cut out from $X$ according to $\tau$,
i.e.,

\[ Y_i = X_{(T_{i-1}, T_i]} = (X_{T_{i-1}+1}, \ldots, X_{T_i}), \quad i \in \mathbb{Z}. \tag{1.2} \]

The joint law of $X$ and $\tau$ is denoted by $P$. Write $|Y_i|$ to denote the length of word $i$.
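The cutting operation in (1.1)-(1.2) and its inverse (glueing) can be illustrated on a finite window. The sketch below is only an illustration: the function names, the truncated power-law sampler for the renewal times and the choice of i.i.d. letters are assumptions made here, not part of the paper.

```python
import random

def sample_renewal_times(n, alpha=2.0, max_len=50, rng=None):
    """Draw n i.i.d. word lengths with P(tau = m) proportional to m**(-alpha).

    The cutoff max_len is an illustrative truncation, not part of the paper.
    """
    rng = rng or random.Random(0)
    lengths = list(range(1, max_len + 1))
    weights = [m ** (-alpha) for m in lengths]
    return rng.choices(lengths, weights=weights, k=n)

def cut_into_words(letters, taus):
    """Cut letters into words Y_i = (X_{T_{i-1}+1}, ..., X_{T_i}) with T_0 = 0."""
    words, start = [], 0
    for tau in taus:
        end = start + tau              # T_i = T_{i-1} + tau_i
        if end > len(letters):
            break                      # not enough letters left for a full word
        words.append(tuple(letters[start:end]))
        start = end
    return words

def concatenate(words):
    """Glue a word sequence back into a letter sequence."""
    return [letter for word in words for letter in word]

rng = random.Random(1)
X = [rng.choice("ab") for _ in range(200)]   # i.i.d. letters, for simplicity only
taus = sample_renewal_times(40, rng=rng)
Y = cut_into_words(X, taus)
```

Glueing the words in `Y` returns the initial segment of `X` that was consumed, which is the finite-window analogue of the statement that concatenating $Y$ recovers $X$.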

The reverse of cutting is glueing. The concatenation operator $\kappa \colon \widetilde{E}^{\mathbb{Z}} \to E^{\mathbb{Z}}$ glues a word
sequence into a letter sequence. In particular, $\kappa(Y) = X$. Given $Q \in \mathcal{P}^{\mathrm{inv}}(\widetilde{E}^{\mathbb{Z}})$ with $m_Q = E_Q[|Y_1|] < \infty$,
let $\Psi_Q \in \mathcal{P}^{\mathrm{inv}}(E^{\mathbb{Z}})$ be defined by

\[ \Psi_Q(A) = \frac{1}{m_Q}\, E_Q\Big[ \sum_{k=0}^{|Y_1|-1} 1_{\{\theta^k \kappa(Y) \in A\}} \Big], \quad A \subset E^{\mathbb{Z}}, \tag{1.3} \]

i.e., the law of $\kappa(Y)$ when $Y$ is drawn from $Q$, turned into a stationary law by randomizing
the location of the origin.

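For $n \in \mathbb{N}$, let $(Y_{(0,n]})^{\mathrm{per}} \in \widetilde{E}^{\mathbb{Z}}$ denote the $n$-periodized version of $Y$. We are interested in
the empirical distribution of words

\[ R_n = \frac{1}{n} \sum_{i=0}^{n-1} \delta_{\widetilde{\theta}^i (Y_{(0,n]})^{\mathrm{per}}}, \tag{1.4} \]

both under $P$ (= annealed law) and under $P(\cdot \mid X)$ for $\nu$-a.a. $X$ (= quenched law).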

1.2. Large deviation principles. If $\nu$ is i.i.d., then $P$ is i.i.d. and the annealed LDP is
standard, with the rate function given by the specific relative entropy of the observed law of
words w.r.t. $P$. The quenched LDP, however, is not standard. The quenched LDP was obtained
in Birkner [2] for the case where $\rho$ has an exponentially bounded tail, and in Birkner, Greven
and den Hollander [3] for the case where $\rho$ has a polynomially decaying tail:

\[ \lim_{\substack{m \to \infty \\ \rho(m) > 0}} \frac{\log \rho(m)}{\log m} = -\alpha, \quad \alpha \in [1, \infty). \tag{1.5} \]

(No condition on the support of $\rho$ is needed other than that it is infinite.) In the latter case,
the quenched rate function turns out to be a sum of two terms, one being the annealed rate
function, the other being proportional to the specific relative entropy of the observed law of
letters w.r.t. $\nu$, obtained by concatenating the words and randomising the location of the origin.
The proportionality constant equals $\alpha - 1$ times the average word length.

The goal of the present paper is to extend both LDP's to the situation where $\nu$ is no longer
i.i.d., but satisfies a mixing condition called summable variation, which will be defined in
Section 2.3. In what follows, $H(\cdot \mid \cdot)$ denotes specific relative entropy (see Dembo and Zeitouni,
Chapter 6, for the definition and key properties).

Theorem 1.1 (Annealed LDP). If $\nu$ has summable variation, then the family of probability
laws $P(R_n \in \cdot)$, $n \in \mathbb{N}$, satisfies the LDP on $\mathcal{P}^{\mathrm{inv}}(\widetilde{E}^{\mathbb{Z}})$ with rate $n$ and with rate function
$I^{\mathrm{ann}} \colon \mathcal{P}^{\mathrm{inv}}(\widetilde{E}^{\mathbb{Z}}) \to [0, \infty]$ given by the specific relative entropy

\[ I^{\mathrm{ann}}(Q) = H(Q \mid P). \tag{1.6} \]

$I^{\mathrm{ann}}$ is lower semi-continuous, has compact level sets, is affine, and has a unique zero at $Q = P$.

Theorem 1.2 (Quenched LDP). If $\nu$ has summable variation, then for $\nu$-a.a. $X$ the family
of conditional probability laws $P(R_n \in \cdot \mid X)$, $n \in \mathbb{N}$, satisfies the LDP on $\mathcal{P}^{\mathrm{inv}}(\widetilde{E}^{\mathbb{Z}})$ with
rate $n$ and with rate function $I^{\mathrm{que}} \colon \mathcal{P}^{\mathrm{inv}}(\widetilde{E}^{\mathbb{Z}}) \to [0, \infty]$ given by the sum of specific relative
entropies

\[ I^{\mathrm{que}}(Q) = H(Q \mid P) + (\alpha - 1)\, m_Q\, H(\Psi_Q \mid \nu). \tag{1.7} \]

$I^{\mathrm{que}}$ is lower semi-continuous, has compact level sets, is affine, and has a unique zero at $Q = P$.

Theorem 1.3. Both LDPs remain valid when E is a Polish space. 

Remark: If $m_Q = \infty$, then the second term in (1.7) is defined to be $\alpha - 1$ times the truncation
limit $\lim_{\mathrm{tr} \to \infty} m_{[Q]_{\mathrm{tr}}} H(\Psi_{[Q]_{\mathrm{tr}}} \mid \nu)$, where $[\cdot]_{\mathrm{tr}}$ is the operator that truncates all the words to
length $\le \mathrm{tr}$. See Birkner, Greven and den Hollander [3] for details.
Remark: Both rate functions are the same as for the i.i.d. case, even though the reference
laws P and v are no longer i.i.d. This lack of independence will require us to go through 
several approximation arguments. Both LDP's can be applied to the problem of pinning of a 
polymer chain at an interface carrying correlated disorder. This application, which is our main 
motivation for extending the LDP's, will be discussed in a future paper. 

1.3. Outline. In Section 2 we collect some basic facts, introduce the relevant mixing coefficients,
and define summable variation. We give examples where this mixing condition holds,
respectively, fails. In Section 3 we prove the annealed LDP by applying a result from Orey
and Pelikan [14]. In Section 4 we prove the quenched LDP by going over the proof in Birkner,
Greven and den Hollander [3] for i.i.d. letter sequences and checking which parts have to be
adapted. In Section 5 we extend the LDP's from finite $E$ to Polish $E$ by using the
Dawson-Gärtner projective limit LDP.

2. Basic facts, mixing coefficients and summable variation 

2.1. Basic facts. Throughout the paper we abbreviate

\[ X_{(m,n]} = (X_{m+1}, \ldots, X_n), \qquad Y_{(m,n]} = (Y_{m+1}, \ldots, Y_n), \qquad -\infty \le m < n \le \infty. \tag{2.1} \]

The associated sigma-algebras are written as

\[ \mathcal{F}_{(m,n]} = \sigma(X_{(m,n]}), \qquad \mathcal{G}_{(m,n]} = \sigma(Y_{(m,n]}). \tag{2.2} \]

Since $X$ is no longer i.i.d., the distribution of a word in $Y$ depends on the outcome of all
the previous words. However, since the word lengths are still i.i.d., when we condition on the
past of the word sequence only the past of the letter sequence is relevant, as is stated in the
next lemma.

Lemma 2.1. $P(A \mid \mathcal{G}_{(-\infty,0]}) = P(A \mid \mathcal{F}_{(-\infty,0]})$ a.s. for all $A \in \mathcal{G}_{(0,\infty)}$.
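Proof. Fix $r \in \mathbb{N}$ and $y_1, \ldots, y_r \in \widetilde{E}$, and pick $A = \{Y_{(0,r]} = y_{(0,r]}\}$. Write

\[ P(A \mid \mathcal{G}_{(-\infty,0]}) = P\Big( \tau_{(0,r]} = |y|_{(0,r]},\; X_{(0, \sum_{i=1}^r |y_i|]} = \kappa(y_{(0,r]}) \,\Big|\, \mathcal{G}_{(-\infty,0]} \Big), \tag{2.3} \]

where $|y_i|$ is the length of word $y_i$. Since $\sigma(\tau_{(0,r]})$ is independent of $\mathcal{G}_{(-\infty,0]}$, we have

\[ P(A \mid \mathcal{G}_{(-\infty,0]}) = P\Big( X_{(0, \sum_{i=1}^r |y_i|]} = \kappa(y_{(0,r]}) \,\Big|\, \mathcal{G}_{(-\infty,0]} \Big) \prod_{i=1}^r \rho(|y_i|). \tag{2.4} \]

But $X$ and $\tau$ are independent as well, and so

\[ P(A \mid \mathcal{G}_{(-\infty,0]}) = \nu\Big( X_{(0, \sum_{i=1}^r |y_i|]} = \kappa(y_{(0,r]}) \,\Big|\, \mathcal{F}_{(-\infty,0]} \Big) \prod_{i=1}^r \rho(|y_i|), \tag{2.5} \]

which yields the claim after we argue backwards. □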
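Write $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. Let $(\nu_{x^-}(\cdot),\, x^- \in E^{-\mathbb{N}_0})$ be a regular version of $\nu(\cdot \mid X_{(-\infty,0]})$, i.e.,

\[ \nu(A) = \int \nu_{x^-}(A)\, d\nu(x^-), \quad A \in \mathcal{F}_{(0,\infty)}. \tag{2.6} \]

From the regular conditional probabilities of $\nu$ we obtain regular conditional probabilities of
$P$ as follows.

Lemma 2.2. The collection $(P_{y^-}(\cdot),\, y^- \in \widetilde{E}^{-\mathbb{N}_0})$ of probability laws on $\widetilde{E}^{\mathbb{N}}$ defined by

\[ P_{y^-}(A) = \int P(A \mid \mathcal{F}_{\mathbb{Z}})\, d\nu_{\kappa(y^-)} \quad \forall\, A \in \mathcal{G}_{(0,\infty)}, \tag{2.7} \]

constitutes a regular version of the conditional probability $P(\cdot \mid \mathcal{G}_{(-\infty,0]})$.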
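Proof. For every $y^- \in \widetilde{E}^{-\mathbb{N}_0}$, $P_{y^-}(\cdot)$ defined in (2.7) is a probability measure. It therefore is
enough to prove (2.7) for cylinder sets. Let $m \in \mathbb{N}$, $(y_i)_{1 \le i \le m} \in \widetilde{E}^m$ and $A = \cap_{1 \le i \le m} \{Y_i = y_i\}$.
Then

\[ \int P(A \mid \mathcal{F}_{\mathbb{Z}})\, d\nu_{\kappa(y^-)} = \int d\nu_{\kappa(y^-)}\, 1_{\{X \in \kappa(A)\}} \prod_{1 \le i \le m} \rho(|y_i|) = \nu_{\kappa(y^-)}(X \in \kappa(A)) \prod_{1 \le i \le m} \rho(|y_i|). \tag{2.8} \]

Since $\int_{\widetilde{E}^{-\mathbb{N}_0}} dP(y^-)\, \nu_{\kappa(y^-)}(\cdot) = \int_{E^{-\mathbb{N}_0}} d\nu(x^-)\, \nu_{x^-}(\cdot) = \nu(\cdot)$, we have

\[ \int dP(y^-) \int P(A \mid \mathcal{F}_{\mathbb{Z}})\, d\nu_{\kappa(y^-)} = \nu(X \in \kappa(A)) \prod_{1 \le i \le m} \rho(|y_i|) = P(A), \tag{2.9} \]

which proves the claim. □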



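2.2. Mixing coefficients. We need the following mixing coefficients for letters and words:

Definition 2.3. (a) For $\Lambda_1 \subset -\mathbb{N}_0$ and $\Lambda_2 \subset \mathbb{N}$, let

\[ \varphi(\Lambda_1, \Lambda_2) = \sup_{\substack{x^-, \hat{x}^- \in E^{-\mathbb{N}_0} \\ (x^-)_{\Lambda_1} = (\hat{x}^-)_{\Lambda_1}}} \; \sup_{\substack{A \in \mathcal{F}_{\Lambda_2} \\ \nu_{x^-}(A) > 0}} \big| \log \nu_{x^-}(A) - \log \nu_{\hat{x}^-}(A) \big|. \tag{2.10} \]

(b) For $\Lambda \subset \mathbb{N}$, let

\[ \psi(\Lambda) = \sup_{y^-, \hat{y}^- \in \widetilde{E}^{-\mathbb{N}_0}} \; \sup_{\substack{A \in \mathcal{G}_{\Lambda} \\ P_{y^-}(A) > 0}} \big| \log P_{y^-}(A) - \log P_{\hat{y}^-}(A) \big|. \tag{2.11} \]

The restrictions $\nu_{x^-}(A) > 0$ and $P_{y^-}(A) > 0$ are put in to avoid $\infty - \infty$. Nonetheless, (2.10)
and (2.11) may be infinite. Note that if $\Lambda_1 = \emptyset$, then the supremum in Definition 2.3(a) is
taken over all $x^-, \hat{x}^- \in E^{-\mathbb{N}_0}$ without any restriction ($(x^-)_\Lambda$ denotes the restriction of $x^-$ to
$\Lambda$). We will use the following abbreviations:

\[ \varphi(k, \cdot) = \varphi((-k, 0], \cdot), \; k \in \mathbb{N}, \qquad \varphi(0, \cdot) = \varphi(\emptyset, \cdot), \qquad \varphi(\cdot, \ell) = \varphi(\cdot, (0, \ell]), \; \ell \in \mathbb{N}. \tag{2.12} \]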
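Lemma 2.4. Let $0 \le m < n$, $y_{(m,n]} \in \widetilde{E}^{n-m}$ and $A = \{Y_{(m,n]} = y_{(m,n]}\}$. For all $y^-, \hat{y}^- \in \widetilde{E}^{-\mathbb{N}_0}$,

\[ P_{y^-}(A) \le E\Big[ \exp\Big\{ \varphi\Big( \emptyset, \big( T_m, T_m + \sum_{k=m+1}^n |y_k| \big] \Big) \Big\}\, P_{\hat{y}^-}(A \mid T_m) \Big]. \tag{2.13} \]

Proof. Using Definition 2.3(a), we have

\[
\begin{aligned}
P_{y^-}(A)
&= E\Big[ \nu_{\kappa(y^-)}\Big( X_{(T_m, T_m + \sum_{k=m+1}^n |y_k|]} = \kappa(y_{(m,n]}) \Big) \prod_{k=m+1}^n \rho(|y_k|) \Big] \\
&\le E\Big[ \exp\Big\{ \varphi\Big( \emptyset, \big( T_m, T_m + \sum_{k=m+1}^n |y_k| \big] \Big) \Big\}\, \nu_{\kappa(\hat{y}^-)}\Big( X_{(T_m, T_m + \sum_{k=m+1}^n |y_k|]} = \kappa(y_{(m,n]}) \Big) \prod_{k=m+1}^n \rho(|y_k|) \Big] \\
&= E\Big[ \exp\Big\{ \varphi\Big( \emptyset, \big( T_m, T_m + \sum_{k=m+1}^n |y_k| \big] \Big) \Big\}\, P_{\hat{y}^-}(A \mid T_m) \Big].
\end{aligned}
\tag{2.14}
\]

□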
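Lemma 2.5. For all $k \in \mathbb{N}_0$, $\ell \in \mathbb{N}$,

\[ \varphi(k, \ell) \le \sum_{m=0}^{\ell-1} \varphi(k + m), \tag{2.15} \]

where $\varphi(k) = \varphi(k, 1)$, $k \in \mathbb{N}_0$.

Proof. We show that, for all $m \in \mathbb{N}_0$ and $k, \ell \in \mathbb{N}$,

\[ \varphi(m, k + \ell) \le \varphi(m, k) + \varphi(m + k, \ell), \tag{2.16} \]

which yields the claim via iteration. To prove (2.16), pick $x_{(0,k+\ell]} \in E^{k+\ell}$ and $x^-, \hat{x}^- \in E^{-\mathbb{N}_0}$
with $(x^-)_{(-m,0]} = (\hat{x}^-)_{(-m,0]}$, and consider the events

\[ A_{(0,k+\ell]} = \{X_{(0,k+\ell]} = x_{(0,k+\ell]}\}, \quad A_{(0,k]} = \{X_{(0,k]} = x_{(0,k]}\}, \quad A_{(k,k+\ell]} = \{X_{(k,k+\ell]} = x_{(k,k+\ell]}\}. \tag{2.17} \]

Estimate

\[
\begin{aligned}
\nu_{x^-}(A_{(0,k+\ell]}) &= \nu_{x^-}(A_{(0,k]})\; \nu_{x^- x_{(0,k]}}(A_{(k,k+\ell]}) \\
&\le e^{\varphi(m,k)}\, \nu_{\hat{x}^-}(A_{(0,k]})\; e^{\varphi(m+k,\ell)}\, \nu_{\hat{x}^- x_{(0,k]}}(A_{(k,k+\ell]}),
\end{aligned}
\tag{2.18}
\]

where $x^- x_{(0,k]}$ is the concatenation of $x^-$ and $x_{(0,k]}$. Insert this estimate into Definition 2.3(a) and take
the supremum over $x_{(0,k+\ell]}$ and $x^-, \hat{x}^-$ to get (2.16). □

Note that $k \mapsto \varphi(k)$ is non-increasing on $\mathbb{N}_0$.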
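For completeness, the iteration that leads from (2.16) to (2.15) can be written out. The sketch below applies (2.16) repeatedly, each time splitting off a block of length 1, and uses only the abbreviation introduced in Lemma 2.5 for the coefficient with second argument 1:

```latex
\begin{aligned}
\varphi(k,\ell)
&\le \varphi(k,1) + \varphi(k+1,\ell-1) \\
&\le \varphi(k,1) + \varphi(k+1,1) + \varphi(k+2,\ell-2) \\
&\;\;\vdots \\
&\le \sum_{m=0}^{\ell-1} \varphi(k+m,1) \;=\; \sum_{m=0}^{\ell-1} \varphi(k+m).
\end{aligned}
```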

2.3. Summable variation. The key mixing condition in our LDP's is summable variation:

\[ \text{(SV)} \qquad \sum_{n \in \mathbb{N}_0} \varphi(n) < \infty. \tag{2.19} \]

The term summable variation is borrowed from the theory of Gibbs measures, where logarithms
of probabilities play the role of interaction potentials, and coefficients similar to our $\varphi(n)$'s are
used to measure the absolute summability of these interaction potentials.

(I) Random processes (with finite alphabet) that satisfy (SV) include i.i.d. processes ($\varphi(n) = 0$
for all $n \in \mathbb{N}_0$), Markov chains of order $m$ ($\varphi(0) < \infty$ and $\varphi(n) = 0$ for all $n \ge m$), and chains
with complete connections whose one-letter forward conditional probabilities have summable
variation. Ledrappier [12, Example 2, Proposition 4] shows that such chains have a unique
invariant measure and are Weak Bernoulli under (SV). Berbee [1, Theorem 1.1] shows that
they have a unique invariant measure and are Bernoulli when $\sum_{n \in \mathbb{N}} \exp[-\sum_{m=1}^{n} \varphi(m)] = \infty$,
a condition slightly weaker than (SV). (Uniqueness of the invariant measure has been proved
more recently by Johansson and Öberg [10] and by Johansson, Öberg and Pollicott [11] under
the even weaker condition $\sum_{n \in \mathbb{N}} \varphi(n)^2 < \infty$.) Yet other examples satisfying (SV) include Ising
spins labeled by $\mathbb{Z}$ with a ferromagnetic pair potential that has a sufficiently thin tail.
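For a first-order Markov chain the coefficients can be computed by hand, which gives a quick sanity check of (SV). The sketch below computes the coefficient at $n = 0$ for a hypothetical two-state transition matrix; the matrix, the function name `phi0_markov` and the convention that the sum in (SV) starts at $n = 0$ are assumptions made for this illustration.

```python
import math
from itertools import product

def phi0_markov(P):
    """Mixing coefficient at n = 0 for a first-order Markov chain with matrix P.

    The supremum over events in sigma(X_1) reduces to single letters,
    because a ratio of sums lies between the smallest and largest termwise ratio.
    """
    states = range(len(P))
    worst = 0.0
    for a, a2, b in product(states, repeat=3):
        if P[a][b] > 0:
            if P[a2][b] == 0:
                return math.inf        # probabilities not comparable
            worst = max(worst, abs(math.log(P[a][b]) - math.log(P[a2][b])))
    return worst

P = [[0.9, 0.1], [0.2, 0.8]]           # hypothetical transition matrix
phi0 = phi0_markov(P)
# For a first-order chain the coefficients vanish for n >= 1,
# so the sum in (SV) reduces to this single term.
C = math.exp(phi0)
```

Here the largest ratio of transition probabilities into a common state is 0.8/0.1, so the coefficient equals log 8 and the exponential of the sum is 8.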

(IIa) A class of random processes that fail to satisfy (SV) is the following. Let $E = \{0, 1\}$, and
let $p$ be any probability law on $\mathbb{N}$ such that $p(\ell) \sim C \ell^{-\gamma}$ for some $\gamma > 2$. Since $\sum_{\ell \in \mathbb{N}} \ell\, p(\ell) < \infty$,
there exists a stationary Markov chain $(A_k)_{k \in \mathbb{Z}}$ on $\mathbb{N}_0$ with the following transition probabilities:

\[ P(A_1 = n + 1 \mid A_0 = n) = \frac{\sum_{\ell > n+1} p(\ell)}{\sum_{\ell > n} p(\ell)}, \qquad P(A_1 = 0 \mid A_0 = n) = \frac{p(n+1)}{\sum_{\ell > n} p(\ell)}, \qquad n \in \mathbb{N}_0. \tag{2.20} \]

The process $(X_k)_{k \in \mathbb{Z}}$ defined by $X_k = 1_{\{A_k = 0\}}$ fails to satisfy (SV). Indeed, pick $n \in \mathbb{N}$ and
let $x, x[n] \in E^{-\mathbb{N}_0}$ be such that $x_i = 1$ for $i \in -\mathbb{N}_0$, $x[n]_i = 0$ for $i \in (-n, 0]$ and $x[n]_i = 1$ for
$i \in (-\infty, -n]$. Then

\[ \varphi(1) \ge \log \nu_x(X_1 = 1) - \log \nu_{x[n]}(X_1 = 1) = \log p(1) - \log\Big( \frac{p(n+1)}{\sum_{\ell > n} p(\ell)} \Big). \tag{2.21} \]

Since this lower bound holds for all $n \in \mathbb{N}$, we conclude by letting $n \to \infty$ that $\varphi(1) = \infty$.
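The divergence of the lower bound in (2.21) can be observed numerically. The sketch below evaluates the bound for a pure power law with exponent 3; the exponent, the truncation of the infinite sums at `cutoff` and the function name are assumptions made for this illustration, and the truncation is only reasonable for $n$ much smaller than the cutoff.

```python
import math

def lower_bound(n, gamma=3.0, cutoff=200_000):
    """log p(1) - log( p(n+1) / sum_{l>n} p(l) ) for p(l) proportional to l**(-gamma).

    The infinite sums are truncated at `cutoff` (an approximation).
    """
    weights = [l ** (-gamma) for l in range(1, cutoff + 1)]
    total = sum(weights)
    tail = sum(weights[n:])            # sum over l > n (weights[n] is l = n + 1)
    p1 = weights[0] / total
    hazard = weights[n] / tail         # P(A_1 = 0 | A_0 = n)
    return math.log(p1) - math.log(hazard)

bounds = [lower_bound(n) for n in (1, 10, 100, 1000)]
```

The values in `bounds` increase with $n$, roughly like $\log n$, in line with the fact that the probability of a renewal at age $n$ decays like $1/n$ for a power-law tail.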

(IIb) Another class of random processes that fail to satisfy (SV) is random walk in random
scenery. Let $S = (S_n)_{n \in \mathbb{Z}}$ be a simple random walk on $\mathbb{Z}^d$, $d \ge 1$, i.e., $S_0 = 0$ and $S_n - S_{n-1} = X_n$
with $(X_n)_{n \in \mathbb{Z}}$ i.i.d. random variables uniformly distributed on $\{e \in \mathbb{Z}^d \colon \|e\| = 1\}$. Let
$\xi = (\xi(x))_{x \in \mathbb{Z}^d}$ be i.i.d. random variables taking the values 0 and 1 with probability $\tfrac12$ each,
and define $Z_n = (X_n, \xi(S_n))$. Then $Z = (Z_n)_{n \in \mathbb{Z}}$ is stationary and ergodic, but not i.i.d. In
den Hollander and Steif [9, Theorems 2.4 and 2.5] it is shown that $Z$ is Weak Bernoulli if and
only if $d \ge 5$. Since (SV) implies Weak Bernoulli (Ledrappier [12, Proposition 4]), $Z$ does not
satisfy (SV) when $1 \le d \le 4$.

3. Annealed LDP 

The annealed LDP in Theorem 1.1 is a process-level LDP. Such LDP's were proven by
Donsker and Varadhan [6, 7] for reference processes that are Markov or Gaussian. Orey [13]
and Orey and Pelikan [14] gave a proof for ratio-mixing processes (see below), using the
observation that any random process can be viewed as a Markov process by keeping track of
its past.

Proposition 3.1. (Orey and Pelikan [14, Theorem 2.1]) Suppose that $P$ has the following
ratio-mixing property:

(RM) There exists a non-decreasing function $n \mapsto m(n)$ such that

\[ 0 \le m(n) \le n, \qquad \lim_{n \to \infty} m(n)/n = 0, \qquad \lim_{n \to \infty} \psi((m(n), n])/n = 0. \tag{3.1} \]

Then the family of probability laws $P(R_n \in \cdot)$, $n \in \mathbb{N}$, satisfies the LDP on $\mathcal{P}^{\mathrm{inv}}(\widetilde{E}^{\mathbb{Z}})$ with rate
$n$ and with rate function given by the specific relative entropy

\[ Q \mapsto H(Q \mid P) = \int_{y^- \in \widetilde{E}^{-\mathbb{N}_0}} Q(dy^-) \int_{y \in \widetilde{E}} Q_{y^-|1}(dy)\, \log\Big( \frac{dQ_{y^-|1}}{dP_{y^-|1}}(y) \Big). \tag{3.2} \]

The specific relative entropy $H(Q \mid P)$ is defined to be infinite when $Q_{y^-|1} \ll P_{y^-|1}$ fails on
a set of $y^-$'s with a strictly positive $Q$-measure. An alternative form of (3.2) is

\[ H(Q \mid P) = \int_{y^- \in \widetilde{E}^{-\mathbb{N}_0}} Q(dy^-)\, H\big( Q_{y^-}(Y_1 \in \cdot) \mid P_{y^-}(Y_1 \in \cdot) \big). \tag{3.3} \]

The latter can be viewed as the specific relative entropy of the laws of two Markov processes,
namely, the laws of the past processes $Y^* = (Y^{(n),*})_{n \in \mathbb{N}}$ with $Y^{(n),*} = (Y_{n-m})_{m \in \mathbb{N}_0}$, $n \in \mathbb{N}$,
when $Y$ is distributed according to $Q$, respectively, $P$. The regular conditional probability
laws $(P_{y^-}(Y_1 \in \cdot),\, y^- \in \widetilde{E}^{-\mathbb{N}_0})$ play the role of transition probabilities for $Y^*$, and regularity
translates into the Feller property.

We are now ready to prove Theorem 1.1.

Proof. From Lemma 2.4 and the fact that $\ell \mapsto \varphi(0, \ell)$ is non-decreasing, we get $P_{y^-}(A) \le e^{\varphi(0,\infty)} P_{\hat{y}^-}(A)$.
Hence Definition 2.3(b) gives $\psi((m, n]) \le \varphi(0, \infty)$ for all $0 \le m < n$. From
Lemma 2.5 we get

\[ \varphi(0, \infty) \le \sum_{n \in \mathbb{N}_0} \varphi(n). \tag{3.4} \]

Hence, if (SV) holds, then (RM) holds for $m(n) = 0$, and so we can apply Proposition 3.1. □



4. Quenched LDP 

In Sections 4.1-4.3 we prove several lemmas that are needed in Section 4.4 to give the proof
of Theorem 1.2. This proof is an extension of the proof in [3] for i.i.d. $\nu$. We focus on those
ingredients where the lack of independence of $\nu$ requires modifications.



4.1. Decoupling inequalities. Abbreviate

\[ C(\varphi) = \exp\Big[ \sum_{n \in \mathbb{N}_0} \varphi(n) \Big] < \infty. \tag{4.1} \]



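Lemma 4.1. For all $x^-, \hat{x}^- \in E^{-\mathbb{N}_0}$, $A \in \mathcal{F}_{(0,\infty)}$ and $n \in \mathbb{N}$,

\[ C(\varphi)^{-1}\, \nu_{\hat{x}^-}(A) \le \nu_{x^-}(A) \le C(\varphi)\, \nu_{\hat{x}^-}(A), \tag{4.2} \]

\[ C(\varphi)^{-1}\, \nu_{\hat{x}^-}(A) \le \nu\big( A \mid X_{(-n,0]} = x^-_{(-n,0]} \big) \le C(\varphi)\, \nu_{\hat{x}^-}(A). \tag{4.3} \]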



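Proof. To prove (4.2), pick $k \in \mathbb{N}$ and $A \in \mathcal{F}_{(0,k]}$. If $\nu_{\hat{x}^-}(A) = 0$ then $\nu_{x^-}(A) = 0$ as well
because $\varphi(0, k) < \infty$ and there is nothing to prove, so we can assume $\nu_{x^-}(A) > 0$. Then, by the
definition of $\varphi(0, k)$ and Lemma 2.5,

\[ C(\varphi)^{-1} \le e^{-\varphi(0,k)} \le \frac{\nu_{x^-}(A)}{\nu_{\hat{x}^-}(A)} \le e^{\varphi(0,k)} \le C(\varphi). \tag{4.4} \]

To prove (4.3), write (with $z^-$ denoting the integration variable)

\[
\begin{aligned}
\nu\big( A \mid X_{(-n,0]} = x^-_{(-n,0]} \big)
&= \frac{\nu\big( \{X_{(-n,0]} = x^-_{(-n,0]}\} \cap A \big)}{\nu\big( X_{(-n,0]} = x^-_{(-n,0]} \big)} \\
&= \frac{\int_{z^- \in E^{-\mathbb{N}_0}} d\nu(z^-)\; \nu_{z^-}\big( \{X_{(0,n]} = x^-_{(-n,0]}\} \cap \theta^{-n} A \big)}{\int_{z^- \in E^{-\mathbb{N}_0}} d\nu(z^-)\; \nu_{z^-}\big( X_{(0,n]} = x^-_{(-n,0]} \big)} \\
&= \frac{\int_{z^- \in E^{-\mathbb{N}_0}} d\nu(z^-)\; \nu_{z^-}\big( X_{(0,n]} = x^-_{(-n,0]} \big)\; \nu_{z^- x^-_{(-n,0]}}(A)}{\int_{z^- \in E^{-\mathbb{N}_0}} d\nu(z^-)\; \nu_{z^-}\big( X_{(0,n]} = x^-_{(-n,0]} \big)} \\
&\le \frac{\int_{z^- \in E^{-\mathbb{N}_0}} d\nu(z^-)\; \nu_{z^-}\big( X_{(0,n]} = x^-_{(-n,0]} \big)\; C(\varphi)\, \nu_{\hat{x}^-}(A)}{\int_{z^- \in E^{-\mathbb{N}_0}} d\nu(z^-)\; \nu_{z^-}\big( X_{(0,n]} = x^-_{(-n,0]} \big)} \\
&= C(\varphi)\, \nu_{\hat{x}^-}(A),
\end{aligned}
\tag{4.5}
\]

where the inequality uses (4.2). The reverse inequality is obtained in a similar manner. □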



Lemma 4.2. Let $m \in \mathbb{N}$, and let $(i_1, \ldots, i_m)$, $(j_1, \ldots, j_m)$ be two collections of integers
satisfying $i_1 < j_1 < i_2 < j_2 < \cdots < i_{m-1} < j_{m-1} < i_m < j_m$. For $1 \le k \le m$, let $A_k \in \mathcal{F}_{(i_k, j_k]}$ and
$p_k = \nu(A_k)$. Suppose that $\nu$ satisfies condition (SV). Then

\[ \nu\big( \cap_{1 \le k \le m} A_k \big) \le C(\varphi)^{m-1} \prod_{1 \le k \le m} p_k. \tag{4.6} \]
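Proof. We give the proof for $m = 2$. The general case can be handled by induction. Let
$i_1 < j_1 < i_2 < j_2$, $A_1 \subset E^{j_1 - i_1}$ and $A_2 \subset E^{j_2 - i_2}$. For all $x^- \in E^{-\mathbb{N}_0}$,

\[
\begin{aligned}
&\nu\big( X_{(i_1,j_1]} \in A_1,\; X_{(i_2,j_2]} \in A_2 \big) \\
&\quad= \sum_{\substack{x_{(i_1,j_1]} \in A_1 \\ x_{(i_2,j_2]} \in A_2}} \nu\big( X_{(i_1,j_1]} = x_{(i_1,j_1]},\; X_{(i_2,j_2]} = x_{(i_2,j_2]} \big) \\
&\quad= \sum_{\substack{x_{(i_1,j_1]} \in A_1 \\ x_{(i_2,j_2]} \in A_2}} \nu\big( X_{(i_1-j_1,0]} = x_{(i_1,j_1]},\; X_{(i_2-j_1,j_2-j_1]} = x_{(i_2,j_2]} \big) \\
&\quad= \sum_{\substack{x_{(i_1,j_1]} \in A_1 \\ x_{(i_2,j_2]} \in A_2}} \nu\big( X_{(i_1-j_1,0]} = x_{(i_1,j_1]} \big)\; \nu\big( X_{(i_2-j_1,j_2-j_1]} = x_{(i_2,j_2]} \mid X_{(i_1-j_1,0]} = x_{(i_1,j_1]} \big) \\
&\quad\le C(\varphi) \sum_{\substack{x_{(i_1,j_1]} \in A_1 \\ x_{(i_2,j_2]} \in A_2}} \nu\big( X_{(i_1-j_1,0]} = x_{(i_1,j_1]} \big)\; \nu_{x^-}\big( X_{(i_2-j_1,j_2-j_1]} = x_{(i_2,j_2]} \big) \\
&\quad= C(\varphi)\, p_1 \sum_{x_{(i_2,j_2]} \in A_2} \nu_{x^-}\big( X_{(i_2-j_1,j_2-j_1]} = x_{(i_2,j_2]} \big),
\end{aligned}
\tag{4.7}
\]

where the first two equalities use the stationarity of $\nu$ and the inequality uses (4.3) in Lemma 4.1. Averaging $x^-$ w.r.t. $\nu$, we get

\[ \nu\big( X_{(i_1,j_1]} \in A_1,\; X_{(i_2,j_2]} \in A_2 \big) \le C(\varphi)\, p_1 p_2. \tag{4.8} \]

□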
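The decoupling inequality (4.6) with $m = 2$ can be checked directly on a small example. The sketch below uses a hypothetical two-state Markov chain with one-letter events at times 1 and 3; the transition matrix, its stationary law and the value 8 of the constant (the largest ratio 0.8/0.1 of transition probabilities into a common state) are assumptions of this illustration, not quantities taken from the paper.

```python
P = [[0.9, 0.1], [0.2, 0.8]]          # hypothetical two-state chain
pi = [2 / 3, 1 / 3]                    # its stationary law: pi P = pi
C = 8.0                                # decoupling constant for this chain

def two_step(P):
    """Matrix product of P with itself, for a small square matrix."""
    n = len(P)
    return [[sum(P[a][c] * P[c][b] for c in range(n)) for b in range(n)]
            for a in range(n)]

P2 = two_step(P)
# A_1 = {X_1 = a} and A_2 = {X_3 = b}: the joint probability is pi[a] * P2[a][b],
# to be compared with the product of the marginals pi[a] * pi[b].
ratios = {(a, b): pi[a] * P2[a][b] / (pi[a] * pi[b])
          for a in range(2) for b in range(2)}
```

All four ratios stay below the constant, as (4.6) predicts; the bound is far from sharp here because the chain mixes quickly.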

4.2. Successive occurrences of patterns. 

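Lemma 4.3. Fix $m \in \mathbb{N}$ and let $A \in \mathcal{F}_{(0,m]}$ be such that $\nu(A) > 0$. Let $(\sigma_n)_{n \in \mathbb{Z}}$ be defined by

\[
\begin{aligned}
&\sigma_0 = \inf\{k \ge 0 \colon \theta^k X \in A\} + m, \\
&\forall\, \ell \in \mathbb{N}, \quad \sigma_\ell = \inf\{k \ge \sigma_{\ell-1} \colon \theta^k X \in A\} + m, \\
&\forall\, \ell \in -\mathbb{N}, \quad \sigma_\ell = \sup\{k \le \sigma_{\ell+1} - 2m \colon \theta^k X \in A\} + m.
\end{aligned}
\tag{4.9}
\]

If $\nu$ satisfies condition (SV), then $\nu$-a.s.,

\[ \limsup_{n \to \infty} \frac{1}{n} \sum_{1 \le \ell \le n} \log[\sigma_\ell - \sigma_{\ell-1}] \le C(\varphi)\, E_\nu[\log \sigma_1]. \tag{4.10} \]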

Proof. The strategy of the proof is to write the sum in (4.10) as an additive functional of
an ergodic process and to use Birkhoff's ergodic theorem. First note that the sequence $(\sigma_n)_{n \in \mathbb{N}}$
cuts blocks out of the letter sequence $X$, which we denote by

\[ B_n = X_{(\sigma_{n-1}, \sigma_n]} \in \widetilde{E}, \quad n \in \mathbb{N}. \tag{4.11} \]

Each of these blocks belongs to the following subset of words:

\[ \widetilde{E}_A = \big\{ y \in \widetilde{E} \colon |y| \ge m;\; \forall\, 0 \le k < |y| - m \colon y_{(k,k+m]} \notin A;\; y_{(|y|-m,|y|]} \in A \big\}. \tag{4.12} \]

Define the process $B^* = (B^*_n)_{n \in \mathbb{N}}$ on $E^{-\mathbb{N}_0}$ by putting $B^*_n = X_{(-\infty, \sigma_n]}$. This process is
Markovian and its transition kernel is given by

\[ P_{A,x^-}(\hat{x}^-) = P\big( B^*_{n+1} = \hat{x}^- \mid B^*_n = x^- \big) = \sum_{y \in \widetilde{E}_A} 1_{\{\hat{x}^- = x^- y\}}\, \nu_{x^-}\big( X_{(0,|y|]} = y \big), \quad x^-, \hat{x}^- \in E^{-\mathbb{N}_0}. \tag{4.13} \]

For the collection $(P_{A,x^-}(\cdot),\, x^- \in E^{-\mathbb{N}_0})$ to be a proper transition kernel, $\sigma_1$ must be $\nu_{x^-}$-a.s. finite
for all $x^- \in E^{-\mathbb{N}_0}$. Since $\nu(A) > 0$, we know from the Recurrence Theorem in Halmos [8]
that $\sigma_1$ is $\nu$-a.s. finite. Since $\nu$ and $(\nu_{x^-})_{x^- \in E^{-\mathbb{N}_0}}$ are equivalent under condition (SV) (note that
$C(\varphi)^{-1} \nu(\cdot) \le \nu_{x^-}(\cdot) \le C(\varphi) \nu(\cdot)$ as a consequence of (4.2) in Lemma 4.1), $\sigma_1$ is $\nu_{x^-}$-a.s. finite
for all $x^- \in E^{-\mathbb{N}_0}$. Since (with a slight abuse of notation) the $B^*$'s are also in $E^{-\mathbb{N}_0} \times \widetilde{E}_A$, we
can write

\[ \sum_{1 \le \ell \le n} \log[\sigma_\ell - \sigma_{\ell-1}] = \sum_{1 \le \ell \le n} \log |\pi(B^*_\ell)|, \tag{4.14} \]

where $\pi$ is defined by $\pi \colon (u, v) \in E^{-\mathbb{N}_0} \times \widetilde{E}_A \mapsto v$. We next apply Birkhoff's ergodic theorem to
the sum in the right-hand side, i.e., to the process $B^*$. This process has a stationary distribution,
which we denote by $P_A$. It is easy to check that $P_A$ is the law of $X_{(-\infty, \sigma_0]}$ conditional on the
event $\cap_{\ell \in -\mathbb{N}} \{\sigma_\ell > -\infty\}$, which has probability one according to the Recurrence Theorem.
Again using (4.2) in Lemma 4.1, we see that for all sets $A$ and $B$ that are measurable w.r.t.
$\sigma(B^*_{(-\infty,0]})$ and $\sigma(B^*_{(0,\infty)})$, respectively,

\[ C(\varphi)^{-1} P_A(A) P_A(B) \le P_A(A \cap B) \le C(\varphi) P_A(A) P_A(B). \tag{4.15} \]

Therefore $P_A$ is Weak Bernoulli (Ledrappier [12]), and hence is ergodic. Thus, we have

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{1 \le \ell \le n} \log[\sigma_\ell - \sigma_{\ell-1}] = E_{P_A}\big( \log[\sigma_1 - \sigma_0] \big). \tag{4.16} \]

Moreover, for all $\hat{x}^- \in E^{-\mathbb{N}_0}$,

\[ E_{P_A}\big( \log[\sigma_1 - \sigma_0] \big) = \int E_{\nu_{x^-}}\big( \log[\sigma_1 - \sigma_0] \big)\, dP_A(x^-) \le C(\varphi)\, E_{\nu_{\hat{x}^-}}\big( \log[\sigma_1 - \sigma_0] \big), \tag{4.17} \]

which gives $E_{P_A}(\log[\sigma_1 - \sigma_0]) \le C(\varphi) E_\nu(\log[\sigma_1 - \sigma_0])$ and completes the proof. □
4.3. Decomposition of relative entropy. 

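Lemma 4.4. Suppose that $\varphi(0) < \infty$. Then, for all $Q \in \mathcal{P}^{\mathrm{inv}}(\widetilde{E}^{\mathbb{Z}})$,

\[
\begin{aligned}
H(Q \mid P) &= -H(Q) - E_Q[\log \rho(\tau_1)] - m_Q\, E_{\Psi_Q}\big[ \log \nu_{X_{(-\infty,0]}}(X_1) \big], \\
H(\Psi_Q \mid \nu) &= -H(\Psi_Q) - E_{\Psi_Q}\big[ \log \nu_{X_{(-\infty,0]}}(X_1) \big].
\end{aligned}
\tag{4.18}
\]

Proof. To get the first relation, write $H(Q \mid P) = -H(Q) - E_Q[\log P_{Y_{(-\infty,0]}}(Y_1)]$,

\[ E_Q\big[ \log P_{Y_{(-\infty,0]}}(Y_1) \big] = E_Q[\log \rho(\tau_1)] + E_Q\big[ \log \nu_{X_{(-\infty,0]}}(X_{(0,\tau_1]}) \big] \tag{4.19} \]

and (recall (1.3))

\[ E_Q\big[ \log \nu_{X_{(-\infty,0]}}(X_{(0,\tau_1]}) \big] = E_Q\Big[ \sum_{k=0}^{\tau_1 - 1} \log \nu_{X_{(-\infty,k]}}(X_{k+1}) \Big] = m_Q\, E_{\Psi_Q}\big[ \log \nu_{X_{(-\infty,0]}}(X_1) \big], \tag{4.20} \]

where we use the abbreviation $\nu_{x^-}(x_\Lambda) = \nu_{x^-}(X_\Lambda = x_\Lambda)$, $\Lambda \subset \mathbb{N}$. The second relation follows
in a similar manner. □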

All terms in the right-hand side of (4.18), except possibly $H(Q)$, are finite because $E$ is finite,
$\rho$ satisfies (1.5), and $\varphi(0) < \infty$.
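As an illustration of how the two relations in (4.18) are used, they can be substituted into the quenched rate function (1.7). The following is a sketch under the additional assumptions, made here only so that all terms can be rearranged, that the mean word length $m_Q$ and the specific entropy $H(Q)$ are finite; $\rho$ denotes the law of the renewal times, $\nu$ the law of the letters and $\tau_1$ the length of the first word:

```latex
\begin{aligned}
I^{\mathrm{que}}(Q)
&= H(Q \mid P) + (\alpha - 1)\, m_Q\, H(\Psi_Q \mid \nu) \\
&= -H(Q) - E_Q[\log \rho(\tau_1)]
   - m_Q\, E_{\Psi_Q}\big[\log \nu_{X_{(-\infty,0]}}(X_1)\big] \\
&\quad - (\alpha - 1)\, m_Q\, H(\Psi_Q)
   - (\alpha - 1)\, m_Q\, E_{\Psi_Q}\big[\log \nu_{X_{(-\infty,0]}}(X_1)\big] \\
&= -H(Q) - (\alpha - 1)\, m_Q\, H(\Psi_Q) - E_Q[\log \rho(\tau_1)]
   - \alpha\, m_Q\, E_{\Psi_Q}\big[\log \nu_{X_{(-\infty,0]}}(X_1)\big].
\end{aligned}
```

In this form the two entropy terms are separated from the two energy-like terms, and the letter term is weighted by $\alpha$ rather than $\alpha - 1$.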

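Lemma 4.5. If $\nu$ satisfies condition (SV), then for all $Q \in \mathcal{P}^{\mathrm{inv,erg}}(\widetilde{E}^{\mathbb{Z}})$,

\[ \lim_{n \to \infty} \frac{1}{n} \log \nu(X_{(0,T_n]}) = m_Q\, E_{\Psi_Q}\big[ \log \nu_{X_{(-\infty,0]}}(X_1) \big] \quad Q\text{-a.s.} \tag{4.21} \]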
Proof. First observe that (4.3) in Lemma 4.1 gives

\[ C(\varphi)^{-1}\, \nu_{X_{(-\infty,0]}}(X_{(0,T_n]}) \le \nu(X_{(0,T_n]}) \le C(\varphi)\, \nu_{X_{(-\infty,0]}}(X_{(0,T_n]}). \tag{4.22} \]

Next write

\[ \log \nu_{X_{(-\infty,0]}}(X_{(0,T_n]}) = \sum_{k=0}^{T_n - 1} \log \nu_{X_{(-\infty,k]}}(X_{k+1}) = \sum_{i=0}^{n-1} \sum_{k=T_i}^{T_{i+1}-1} \log \nu_{X_{(-\infty,k]}}(X_{k+1}). \tag{4.23} \]

Use (4.23) and the ergodicity of $Q$ to obtain, for $Q$-a.a. $Y$,

\[ \lim_{n \to \infty} \frac{1}{n} \log \nu_{X_{(-\infty,0]}}(X_{(0,T_n]}) = E_Q\Big[ \sum_{k=0}^{\tau_1 - 1} \log \nu_{X_{(-\infty,k]}}(X_{k+1}) \Big] = m_Q\, E_{\Psi_Q}\big[ \log \nu_{X_{(-\infty,0]}}(X_1) \big]. \tag{4.24} \]

Combine (4.22)-(4.24) to get the claim. □

4.4. Proof of quenched LDP. We are now ready to give the proof of Theorem 1.2.

Proof. The proof is an extension of the proof in [3] for i.i.d. $\nu$. Since the latter is rather long, it
is not possible to repeat all the ingredients here. Below we restrict ourselves to indicating the
necessary modifications, which are based on the results in Sections 4.1-4.3. We leave it to the
reader to go over the full proof in [3] and check that, indeed, these are the only modifications
needed.

Decomposition of relative entropies. Replace Eq.(1.25) and Eq.(1.26) of [3] by the relations in
Lemma 4.4.



Upper bound. Fix $\varepsilon_1, \delta_1 > 0$. Replace the fourth line in the definition of the event defined in
Eq.(3.4) of [3] by

\[ \Big\{ \frac{1}{M} \log \nu(X_{(0,T_M]}) \in m_Q\, E_{\Psi_Q}\big[ \log \nu_{X_{(-\infty,0]}}(X_1) \big] + [-\varepsilon_1, \varepsilon_1] \Big\}. \tag{4.25} \]

By Lemma 4.5 the event in Eq.(3.4) of [3] has probability at least $1 - \delta_1/4$ for $M$ large enough.
Parts 3.2 and 3.3 of [3] are unchanged. The next (harmless) modification is in Eq.(3.39) of [3],
which has to be replaced by

\[ P\big( \cap_{1 \le k \le n} \{A_k = a_k\} \big) \le [C(\varphi)\, p]^{\sum_{1 \le k \le n} a_k}, \tag{4.26} \]

where $A_k$ is the indicator defined in Eq.(3.36) and Eq.(3.37) of [3], and $a_k \in \{0, 1\}$. This
relation can be proved via Lemma 4.2.

Lower bound. One modification is needed to go from Eq.(4.7) to Eq.(4.8) of [3], since the
increments of the $\sigma^{(M)}_\ell$, $\ell \in \mathbb{N}$, defined in Eq.(4.6) of [3] are no longer i.i.d. Use Lemma 4.3
instead. □



5. Extension to Polish spaces 

In this section we prove Theorem 1.3, i.e., we extend the LDP's in Theorems 1.1-1.2 from
a finite letter space to a Polish letter space. We first prove the LDP's for a sequence of
coarse-grained finite letter spaces associated with a sequence of nested finite partitions of the Polish
letter space. After that we apply the Dawson-Gärtner projective limit LDP (see Dembo and
Zeitouni [4], Lemma 4.6.1). A somewhat delicate point is that (SV) for the full process does
not necessarily imply (SV) for the coarse-grained process. Indeed, the first supremum in (2.10)
decreases under coarse-graining while the second supremum increases. The way out is to use
(SV) for the full process to prove the decoupling inequalities in Section 4.1 for the
coarse-grained process.
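Let $X = (X_k)_{k \in \mathbb{Z}}$ be a stationary process on a Polish space $(E, d)$, with $(\nu_{x^-}(\cdot),\, x^- \in E^{-\mathbb{N}_0})$
a regular version of the conditional probability $\nu(\cdot \mid X_{(-\infty,0]})$ satisfying condition (SV), i.e.,

\[ C(\varphi) = \exp\Big[ \sum_{n \in \mathbb{N}_0} \varphi(n) \Big] < \infty, \tag{5.1} \]

where

\[ \varphi(n) = \sup_{\substack{x^-, \hat{x}^- \in E^{-\mathbb{N}_0} \\ d(x^-, \hat{x}^-) \le 2^{-n}}} \; \sup_{\substack{A \in \mathcal{F}_1 \\ \nu_{x^-}(A) > 0}} \big| \log \nu_{x^-}(A) - \log \nu_{\hat{x}^-}(A) \big| \tag{5.2} \]

with

\[ d(x^-, \hat{x}^-) = \sum_{k \in \mathbb{N}_0} 2^{-(k+1)} \big[ 1 \wedge d(x_{-k}, \hat{x}_{-k}) \big]. \tag{5.3} \]

We assume that, for any $x^-, \hat{x}^- \in E^{-\mathbb{N}_0}$, the measures $\nu_{x^-|1} = \nu_{x^-}(X_1 \in \cdot)$ and
$\nu_{\hat{x}^-|1} = \nu_{\hat{x}^-}(X_1 \in \cdot)$ are equivalent, so that the Radon-Nikodym derivative $d\nu_{x^-|1}/d\nu_{\hat{x}^-|1}$ exists and

\[ \sup_{\substack{A \in \mathcal{F}_1 \\ \nu_{x^-}(A) > 0}} \big| \log \nu_{x^-}(A) - \log \nu_{\hat{x}^-}(A) \big| = \operatorname{supess} \Big| \log \frac{d\nu_{x^-|1}}{d\nu_{\hat{x}^-|1}} \Big|, \tag{5.4} \]

leading to the alternative definition

\[ \varphi(n) = \sup_{\substack{x^-, \hat{x}^- \in E^{-\mathbb{N}_0} \\ d(x^-, \hat{x}^-) \le 2^{-n}}} \operatorname{supess} \Big| \log \frac{d\nu_{x^-|1}}{d\nu_{\hat{x}^-|1}} \Big|. \tag{5.5} \]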


Similarly as in Section 2.3, we note that (SV) holds for i.i.d. processes, for Markov chains of
finite order with $\varphi(0) < \infty$, and a subclass of chains with complete connections whose letter
space is countable (Berbee [1]). Other examples are rotators that are labelled by $\mathbb{Z}$, take values
in the unit circle, and interact with each other according to a Hamiltonian with long-range
potentials that have a sufficiently thin tail.



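The following lemma generalizes (4.2) in Lemma 4.1.

Lemma 5.1. For all $x^-, \hat{x}^- \in E^{-\mathbb{N}_0}$ and $A \in \mathcal{F}_{(0,\infty)}$,

\[ C(\varphi)^{-1}\, \nu_{\hat{x}^-}(A) \le \nu_{x^-}(A) \le C(\varphi)\, \nu_{\hat{x}^-}(A). \tag{5.6} \]

Proof. For all $x^-, \hat{x}^- \in E^{-\mathbb{N}_0}$ and $n \in \mathbb{N}$,

\[ \frac{d\nu_{x^-|n}}{d\nu_{\hat{x}^-|n}}(x_1, \ldots, x_n) = \frac{d\nu_{x^-|1}}{d\nu_{\hat{x}^-|1}}(x_1) \times \frac{d\nu_{x^- x_1|1}}{d\nu_{\hat{x}^- x_1|1}}(x_2) \times \cdots \times \frac{d\nu_{x^- x_1 \cdots x_{n-1}|1}}{d\nu_{\hat{x}^- x_1 \cdots x_{n-1}|1}}(x_n) \tag{5.7} \]

\[ \le \exp[\varphi(0) + \varphi(1) + \cdots + \varphi(n-1)] \le C(\varphi), \tag{5.8} \]

which proves the claim. □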


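Let $\mathcal{E}_c = \{E_1, \ldots, E_c\}$, $c \in \mathbb{N}$, be a finite partition of $E$. Identify $\mathcal{E}_c^{\mathbb{Z}}$ with $\{1, \ldots, c\}^{\mathbb{Z}}$. Let
$X^{(c)} = (X^{(c)}_k)_{k \in \mathbb{Z}}$ on $\mathcal{E}_c^{\mathbb{Z}}$ be the coarse-graining of $X$ on $E^{\mathbb{Z}}$ defined by

\[ X^{(c)}_k = \sum_{i=1}^{c} i\, 1_{\{X_k \in E_i\}}. \tag{5.9} \]

Write $\mathcal{F}^{(c)}_{(0,\infty)} = \sigma(X^{(c)}_{(0,\infty)})$. The following lemma generalizes (4.3) in Lemma 4.1.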
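A concrete instance of coarse-graining may help fix ideas. The sketch below takes the unit interval as letter space and partitions it into $c$ intervals of equal length, so that the partitions with $c = 2, 4, 8, \ldots$ are nested; the choice of letter space, the partition and the function names are assumptions made for this illustration only.

```python
def coarse_grain(x, c):
    """Index in {1, ..., c} of the cell [(i-1)/c, i/c) containing x in [0, 1)."""
    return min(int(c * x), c - 1) + 1

def coarse_grain_sequence(xs, c):
    """Apply the coarse-graining letter by letter."""
    return [coarse_grain(x, c) for x in xs]

xs = [0.05, 0.30, 0.55, 0.80]
level2 = coarse_grain_sequence(xs, 2)
level4 = coarse_grain_sequence(xs, 4)
```

Nestedness shows up as the fact that the cell index at level 4 determines the cell index at level 2, which is the property the projective limit argument relies on.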
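Lemma 5.2. For all $x^- \in E^{-\mathbb{N}_0}$, $c \in \mathbb{N}$, $i^-, j^- \in \{1, \ldots, c\}^{-\mathbb{N}_0}$, $A \in \mathcal{F}^{(c)}_{(0,\infty)}$ and $m, n \in \mathbb{N}$,

\[ C(\varphi)^{-1}\, \nu_{x^-}(A) \le \nu\big( A \mid X^{(c)}_{(-n,0]} = i^-_{(-n,0]} \big) \le C(\varphi)\, \nu_{x^-}(A), \tag{5.10} \]

\[ C(\varphi)^{-2}\, \nu\big( A \mid X^{(c)}_{(-m,0]} = j^-_{(-m,0]} \big) \le \nu\big( A \mid X^{(c)}_{(-n,0]} = i^-_{(-n,0]} \big) \le C(\varphi)^{2}\, \nu\big( A \mid X^{(c)}_{(-m,0]} = j^-_{(-m,0]} \big), \tag{5.11} \]

\[ C(\varphi)^{-1}\, \nu\big( A \mid X^{(c)}_{(-n,0]} = i^-_{(-n,0]} \big) \le \nu(A) \le C(\varphi)\, \nu\big( A \mid X^{(c)}_{(-n,0]} = i^-_{(-n,0]} \big), \tag{5.12} \]

provided that the events on which we condition have positive probability.

Proof. Note that (5.11) follows by applying (5.10) twice, while (5.12) follows by integrating
$x^-$ w.r.t. $\nu$ in (5.10). Therefore it suffices to prove (5.10). To that end write (with $z^-$ denoting the integration variable)

\[ \nu\big( A \mid X^{(c)}_{(-n,0]} = i^-_{(-n,0]} \big) = \frac{\int_{E^{-\mathbb{N}_0}} \nu_{z^-}\big( \{X^{(c)}_{(0,n]} = i^-_{(-n,0]}\} \cap \theta^{-n} A \big)\, d\nu(z^-)}{\int_{E^{-\mathbb{N}_0}} \nu_{z^-}\big( X^{(c)}_{(0,n]} = i^-_{(-n,0]} \big)\, d\nu(z^-)}. \tag{5.13} \]

The integral in the numerator equals

\[ \int_{E^{-\mathbb{N}_0}} \int_{E^n} d\nu_{z^-}(x_{(0,n]})\; 1_{\{x^{(c)}_{(0,n]} = i^-_{(-n,0]}\}}\; \nu_{z^- x_{(0,n]}}(A)\; d\nu(z^-), \tag{5.14} \]

from which the claim follows via Lemma 5.1. □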



In what follows we need the notion of conditional local absolute continuity (which is weaker 
than absolute continuity). 

Definition 5.3. Let $F$ be a Polish space equipped with its Borel $\sigma$-algebra, and let $\lambda, \mu$ be
two stationary probability measures on $F^{\mathbb{Z}}$ with respective regular conditional probabilities
$(\lambda_{x^-},\, x^- \in F^{-\mathbb{N}_0})$ and $(\mu_{x^-},\, x^- \in F^{-\mathbb{N}_0})$. The law $\lambda$ is said to be conditionally locally
absolutely continuous w.r.t. the law $\mu$ (written as $\lambda \ll_{\mathrm{cond}} \mu$) when, for $\lambda$-a.a. $x^-$ and all
$n \in \mathbb{N}$, $\lambda_{x^-|n}$ is absolutely continuous w.r.t. $\mu_{x^-|n}$ (written as $\lambda_{x^-|n} \ll \mu_{x^-|n}$), where $\lambda_{x^-|n}$
and $\mu_{x^-|n}$ are the marginal laws on the first $n$ coordinates.

Note that because $F$ is Polish the set $\{x^- \in F^{-\mathbb{N}_0} \colon \lambda_{x^-|n} \ll \mu_{x^-|n}\}$ is measurable. We are
now ready to prove Theorem 1.3.

Proof. We need to prove both the annealed LDP and the quenched LDP. 

Annealed LDP. Lemma 5.1 shows that under condition (SV) Lemmas 2.4-2.5 carry over from
finite letters to Polish letters. Therefore the ratio-mixing property of Orey and Pelikan [14]
again yields the annealed LDP.

Quenched LDP. The proof comes in 4 steps. 

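1. We first use Lemmas 5.1-5.2 to show that Lemmas 4.2-4.5 carry over to the coarse-grained
process $X^{(c)}$ defined in (5.9) for every $c \in \mathbb{N}$. This is straightforward, except that Lemma 4.5
carries over to $Q \in \mathcal{P}^{\mathrm{inv,erg}}((\widetilde{\mathcal{E}}_c)^{\mathbb{Z}})$ only when $\Psi_Q \ll_{\mathrm{cond}} \nu^{(c)}$, where $\nu^{(c)}$ denotes the law of
$X^{(c)}$. We will see in Step 4 below that, because $H(\Psi_Q \mid \nu^{(c)}) = \infty$ when $\Psi_Q \ll_{\mathrm{cond}} \nu^{(c)}$ fails,
this restriction does not affect the LDP.

2. To prove the restricted version of Lemma 4.5, let $Q \in \mathcal{P}^{\mathrm{inv,erg}}((\widetilde{\mathcal{E}}_c)^{\mathbb{Z}})$ be such that $\Psi_Q \ll_{\mathrm{cond}} \nu^{(c)}$.
Using the notation introduced below (4.20), we know from Lemma 5.2 (by letting $n \to \infty$ in
Eq.(5.12) and using the Martingale Convergence Theorem) that, for $\nu^{(c)}$-a.a. $X^{(c)}_{(-\infty,0]}$,

\[ \nu^{(c)}\Big( C(\varphi)^{-1}\, \nu_{X^{(c)}_{(-\infty,0]}}(X^{(c)}_{(0,n]}) \le \nu(X^{(c)}_{(0,n]}) \le C(\varphi)\, \nu_{X^{(c)}_{(-\infty,0]}}(X^{(c)}_{(0,n]}) \,\Big|\, X^{(c)}_{(-\infty,0]} \Big) = 1. \tag{5.15} \]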
By conditional local absolute continuity we have, for $\Psi_Q$-a.a. $X^{(c)}_{(-\infty,0]}$,
This implies that, for $\Psi_Q$-a.a. $X^{(c)}_{(-\infty,0]}$,

\[ \ldots \,\Big|\, X^{(c)}_{(-\infty,0]} \Big) = 1, \tag{5.16} \]

which settles the restricted version of Lemma 4.5.

3. By the same argument as in Section 4.4, we now know that the quenched LDP holds for
$X^{(c)}$ for all $c \in \mathbb{N}$ (see Step 4 below for comments). Picking for $\mathcal{E}_c = \{E_1, \ldots, E_c\}$, $c \in \mathbb{N}$, a nested
sequence of finite partitions of $E$ as in [3, Section 8], we conclude from the Dawson-Gärtner
projective limit LDP that the quenched LDP also holds for $X$, with rate function

\[ I^{\mathrm{que}}(Q) = \sup_{c \in \mathbb{N}} I^{\mathrm{que}}_c(Q^{(c)}), \quad Q \in \mathcal{P}^{\mathrm{inv}}(\widetilde{E}^{\mathbb{Z}}), \tag{5.18} \]

where $Q^{(c)}$ is the coarse-graining of $Q$, and $I^{\mathrm{que}}_c$ is the coarse-grained rate function. The
argument in [3, Section 8] shows that the supremum equals the rate function given in (1.7), i.e., the
coarse-grained relative entropies converge to the full relative entropies as $c \to \infty$. (Deuschel
and Stroock [5, Lemma 4.4.15] implies that the coarse-grained relative entropies are monotone
in $c$.)

4. To obtain the quenched LDP, we must prove Eq.(3.1) and Eq.(4.1) in [3] for the
coarse-grained process. In Steps 1-3 this has already been achieved for $Q \in \mathcal{P}^{\mathrm{inv,fin}}((\widetilde{\mathcal{E}}_c)^{\mathbb{Z}})$ with
$\Psi_Q \ll_{\mathrm{cond}} \nu^{(c)}$. Eq.(4.1) in [3] trivially carries over when the latter restriction fails, but for
Eq.(3.1) an additional argument is needed. We must show that there exists a sequence $(O_k(Q))_{k \in \mathbb{N}}$
of shrinking open neighborhoods of $Q$ such that

\[ \lim_{k \to \infty} \limsup_{N \to \infty} \frac{1}{N} \log P^{(c)}\big( R^{(c)}_N \in O_k(Q) \mid X^{(c)} \big) = -\infty, \tag{5.19} \]

where $P^{(c)}$ denotes the coarse-graining of $P$. This can be done via an annealed estimate. Indeed,
for $\nu$-a.a. $X^{(c)}$,

\[
\begin{aligned}
\limsup_{N \to \infty} \frac{1}{N} \log P^{(c)}\big( R^{(c)}_N \in O_k(Q) \mid X^{(c)} \big)
&\le \limsup_{N \to \infty} \frac{1}{N} \log P^{(c)}\big( R^{(c)}_N \in O_k(Q) \big) \\
&\le - \inf_{Q' \in \overline{O_k(Q)}} H(Q' \mid P^{(c)}),
\end{aligned}
\tag{5.20}
\]

where the last inequality follows from the annealed LDP. (This needs justification, since the
annealed LDP was proved under condition (SV), which is not necessarily satisfied for $\nu^{(c)}$.
However, by Lemma 5.2, a decoupling inequality holds for a.a. pairs of coarse-grained pasts.
Therefore there must be a regular conditional probability of $\nu^{(c)}$ satisfying Orey and Pelikan's
ratio-mixing condition.) A sequence $(O_k(Q))_{k \in \mathbb{N}}$ satisfying (5.19) is easily obtained by letting
$k \to \infty$ and using the lower semi-continuity of $Q' \mapsto H(Q' \mid P^{(c)})$ together with the fact that
$H(Q \mid P^{(c)}) \ge m_Q H(\Psi_Q \mid \nu^{(c)}) = \infty$ (see [3, Eqs.(1.30-1.32)]). □